A Family of Accelerators for Matrix-Vector Arithmetics Based on High-Radix Multiplier Structures
Authors
Abstract
A methodology for designing processor architectures oriented to matrix-vector operations is proposed in this paper. The methodology is based on high-radix multiplication: first, a list of potential partial products (PPs) of one operand with all possible t-bit numbers (t ∈ {2, 3, 4}) is computed by simple shifts and additions; then, PPs selected from this list according to the t-bit slices of the other operand are shifted and added. The main advantage of the proposed method is that the list of potential PPs can be reused whenever one multiplicand is to be multiplied by several multipliers. Another advantage is that the hardware blocks involved in high-radix multiplication can also be used independently to implement other tasks, such as parallel additions/subtractions and accumulations. This allows a group of modifications to be introduced into high-radix multiplier structures, making them reconfigurable, so that single devices with the twofold functionality of either programmable processors or reconfigurable hardware accelerators can be designed.
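The scheme described above can be illustrated with a minimal software sketch, assuming unsigned integer operands. The function names `pp_table`, `multiply_with_table` and `matvec` are hypothetical, and the table is built with the recurrence 2k·a = (k·a) << 1 and (2k+1)·a = 2k·a + a, which is one possible shift-and-add construction rather than necessarily the one used in the paper. In a matrix-vector product y = A x, each x[j] multiplies every entry of column j, so its PP table is built once and reused for all rows.

```python
def pp_table(a: int, t: int) -> list[int]:
    """Return [a*0, a*1, ..., a*(2**t - 1)] using only shifts and additions."""
    table = [0] * (1 << t)
    for d in range(1, 1 << t):
        if d % 2 == 0:
            table[d] = table[d // 2] << 1  # even digit: shift an earlier entry
        else:
            table[d] = table[d - 1] + a    # odd digit: previous entry plus a
    return table


def multiply_with_table(table: list[int], b: int, t: int, width: int) -> int:
    """Multiply by an unsigned width-bit multiplier b, one t-bit slice at a time."""
    mask = (1 << t) - 1
    acc = 0
    for shift in range(0, width, t):
        digit = (b >> shift) & mask        # current t-bit slice of b
        acc += table[digit] << shift       # select a precomputed PP, shift, add
    return acc


def matvec(A: list[list[int]], x: list[int], t: int = 4, width: int = 16) -> list[int]:
    """y = A x for unsigned integers; the PP table of x[j] is reused down column j."""
    y = [0] * len(A)
    for j, xj in enumerate(x):
        table = pp_table(xj, t)            # built once per multiplicand x[j]
        for i, row in enumerate(A):
            y[i] += multiply_with_table(table, row[j], t, width)
    return y
```

For example, `matvec([[3, 5], [7, 11]], [13, 17])` returns `[124, 278]`. The design trade-off is storage for 2^t table entries per multiplicand in exchange for only ⌈width/t⌉ shifted additions per multiplication.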
Similar Resources
High speed Radix-4 Booth scheme in CNTFET technology for high performance parallel multipliers
A novel and robust radix-4 Booth scheme implemented in Carbon Nanotube Field-Effect Transistor (CNTFET) technology is presented in this paper. The main advantage of the proposed scheme is its improved speed compared with previous designs. With the help of modifications applied to the encoder section using Pass Transistor Logic (PTL), the corresponding capacitances o...
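The radix-4 Booth recoding that such encoders implement can be sketched at the arithmetic level as follows. This is the standard textbook recoding, not the CNTFET/PTL circuit of the cited work; the names `booth_radix4_digits` and `booth_radix4_multiply` are hypothetical, and n is assumed even with b in the n-bit two's-complement range.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Radix-4 Booth digits (least significant first) of an n-bit two's-complement b."""
    # Python's >> on negative ints is arithmetic, so these are two's-complement bits.
    bits = [(b >> i) & 1 for i in range(n)]
    digits = []
    prev = 0  # implicit bit b[-1] = 0
    for i in range(0, n, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)  # digit in {-2, ..., 2}
        prev = bits[i + 1]
    return digits


def booth_radix4_multiply(a: int, b: int, n: int) -> int:
    acc = 0
    for k, d in enumerate(booth_radix4_digits(b, n)):
        acc += (d * a) << (2 * k)  # d*a is 0, ±a or ±2a: only shifts and negations
    return acc
```

Because every digit lies in {−2, …, 2}, each partial product is an easy multiple of a, and the number of partial products is halved compared with radix-2.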
Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators
Hardware accelerators are becoming ubiquitous in high-performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming language extensions (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount to improving productivity while effectively exploiting the underlying hardware. We present an optimized n...
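The symmetry that a SYMV kernel exploits can be shown with a small sketch: for a symmetric A, only the lower triangle needs to be read, and each stored off-diagonal entry contributes to two outputs. This is a plain Python illustration of y = A x under that assumption, not the GPU kernel of the cited work; the name `symv_lower` is hypothetical.

```python
def symv_lower(L: list[list[float]], x: list[float]) -> list[float]:
    """y = A x for symmetric A, reading only L[i][j] with j <= i (lower triangle)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] += L[i][i] * x[i]      # diagonal entry
        for j in range(i):
            a = L[i][j]             # each stored off-diagonal entry is read once...
            y[i] += a * x[j]        # ...used as A[i][j]
            y[j] += a * x[i]        # ...and as its mirror A[j][i]
    return y
```

For instance, `symv_lower([[2], [1, 3], [0, 4, 5]], [1, 2, 3])` gives `[4.0, 19.0, 23.0]`. Reading about n(n+1)/2 entries instead of n² roughly halves matrix traffic, which matters because the kernel is memory-bound.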
Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators
Hardware accelerators are becoming ubiquitous in high-performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming languages (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount to improving productivity while effectively exploiting the underlying hardware. We present an optimized numerical k...
A Hybrid Radix-4/Radix-8 Low Power Signed Multiplier Architecture
A hybrid radix-4/radix-8 architecture targeted at high-bit-width, general-purpose digital multipliers is presented as a compromise between the high speed of a radix-4 multiplier architecture and the low power dissipation of a radix-8 multiplier architecture. In this hybrid radix-4/radix-8 multiplier architecture, the performance bottleneck of a radix-8 multiplier, the generation of three times the m...
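Why radix-8 has a harder multiple than radix-4 can be seen from the recoding itself: radix-8 Booth digits range over {−4, …, 4}, so the odd multiple 3a must be produced by a genuine addition (2a + a) rather than a shift. The sketch below assumes n is a multiple of 3 and b is an n-bit two's-complement value; `booth_radix8_digits` and `booth_radix8_multiply` are hypothetical names, and this shows only the standard recoding, not the hybrid architecture of the cited work.

```python
def booth_radix8_digits(b: int, n: int) -> list[int]:
    """Radix-8 Booth digits (least significant first) of an n-bit two's-complement b."""
    bits = [(b >> i) & 1 for i in range(n)]
    digits, prev = [], 0  # prev is the implicit bit b[-1] = 0
    for i in range(0, n, 3):
        digits.append(-4 * bits[i + 2] + 2 * bits[i + 1] + bits[i] + prev)  # in {-4..4}
        prev = bits[i + 2]
    return digits


def booth_radix8_multiply(a: int, b: int, n: int) -> int:
    three_a = (a << 1) + a  # the "hard" multiple: needs a carry-propagate addition
    multiples = {0: 0, 1: a, 2: a << 1, 3: three_a, 4: a << 2}
    acc = 0
    for k, d in enumerate(booth_radix8_digits(b, n)):
        pp = multiples[abs(d)]
        acc += (-pp if d < 0 else pp) << (3 * k)
    return acc
```

Radix-8 yields a third fewer partial products than radix-4, but 3a must be ready before any of them can be selected, which is the bottleneck the cited work trades off.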
DESIGN OF BOOTH ENCODED MODULO 2^n-1 MULTIPLIER USING RADIX-8 WITH HIGH DYNAMIC RANGE RESIDUE NUMBER SYSTEM
A special-moduli-set Residue Number System (RNS) with a high Dynamic Range (DR) can speed up the execution of the very-large-word-length repetitive multiplications found in applications such as public-key cryptography. The modulo 2^n-1 multiplier is usually the noncritical datapath among all modulo multipliers in such a high-DR RNS multiplier. This timing slack can be exploited to reduce the system area and p...
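Modulo 2^n − 1 arithmetic is cheap because 2^n ≡ 1 (mod 2^n − 1): the bits above position n can be folded back onto the low n bits with an end-around carry instead of a division. The sketch below applies this identity to a full product; `mod_mersenne` and `mul_mod_mersenne` are hypothetical names, and a hardware multiplier would typically fold partial products as they are generated rather than reduce the finished product.

```python
def mod_mersenne(x: int, n: int) -> int:
    """Reduce a non-negative x modulo 2**n - 1 by end-around-carry folding."""
    m = (1 << n) - 1
    while x > m:
        x = (x & m) + (x >> n)  # 2**n ≡ 1, so the high part is added to the low part
    return 0 if x == m else x   # all-ones is a second representation of zero


def mul_mod_mersenne(a: int, b: int, n: int) -> int:
    """(a * b) mod (2**n - 1) for non-negative a, b."""
    return mod_mersenne(a * b, n)
```

Each fold is just an n-bit addition, which is why modulo 2^n − 1 channels tend to be fast and can leave the timing slack mentioned above.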